This year we did not use ClueWeb12 or ClueWeb12-B; instead, we based our approach on data we
crawled from the open web. First, we use an external structured resource, the Google Place API [1], to find
all possible candidate places within a 5-hour drive. Second, we use Yandex [2] to find a
description of each place, since we have the URL of the corresponding place. Then we classify the
descriptions of all places. Finally, we rank the pages based on users' preferences.

1 Data Preparation

The Google Place API is used to generate candidate places for each query. We get pages from the Google
Place API and get their descriptions from Yandex. It is a challenge to get enough candidate places because
of the API's request limit. Another challenge is that the Google Place API does not return enough
candidate places for every geographical position. We therefore use the places returned in the current
round as seeds to retrieve more nearby places in the next round. After several rounds, we remove
duplicate places to obtain the candidate set.
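
The round-based expansion described above can be sketched as follows. This is a minimal illustration, not our crawler: `fetch_nearby` is a hypothetical callable standing in for one nearby-places request, and the place record fields (`id`, `lat`, `lng`) and the default of three rounds are assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical place record: {"id": str, "lat": float, "lng": float}
Place = Dict[str, object]


def expand_candidates(
    seed: Tuple[float, float],
    fetch_nearby: Callable[[float, float], Iterable[Place]],
    rounds: int = 3,
) -> List[Place]:
    """Collect places round by round; new places of one round seed the next."""
    seen: Dict[str, Place] = {}          # place id -> place, used to drop duplicates
    frontier = [seed]                    # positions to query in the current round
    for _ in range(rounds):
        next_frontier = []
        for lat, lng in frontier:
            for place in fetch_nearby(lat, lng):
                pid = place["id"]
                if pid in seen:
                    continue             # duplicate place, already collected
                seen[pid] = place
                next_frontier.append((place["lat"], place["lng"]))
        if not next_frontier:
            break                        # no new places were found in this round
        frontier = next_frontier
    return list(seen.values())
```

Keeping the deduplication inside the loop, rather than at the end, also avoids spending requests on positions that were already expanded, which matters under a request limit.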

2 Query Generation

Queries are the basis of our scheme. We want to build good queries so that the search engine
returns good suggestions. Each query consists of a base part and a keyword part. The base part is
geographical information giving the longitude and latitude. For the base part we use Google Places to
generate basic candidates. Then, with the help of Yandex, we can get a brief description of almost every
item. By running procedure "lda", we generate a query vector based on the LDA model: each candidate
description generates a topic vector. By running procedure "cat", the query is generated by manually
mapping every example to a vector of 10 topics. Then, for each user, an average vector
of all the examples rated 5 or 4 is generated.

3 Scheme I

Procedure (run) "cat" consists of two steps. The first step is generating users' topic vectors
manually. We first map each example to a 10-dimensional vector; a user's profile vector is then
defined as:

$$P_j = \frac{1}{n} \sum_{i=1}^{n} e_{ij}$$

where P_j is the j-th dimension of the user's profile vector, e_{ij} is the j-th dimension of the i-th
example rated 4 or 5 by the user, and n is the number of examples rated 4 or 5 by the user.
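
The per-dimension average over the examples rated 4 or 5 can be sketched as below. The function name `user_profile` and the input layout (a list of `(vector, rating)` pairs) are assumptions made for this example, as is returning a zero vector when the user has no example rated 4 or 5.

```python
from typing import List, Sequence, Tuple


def user_profile(
    rated_examples: Sequence[Tuple[Sequence[float], int]],
    dims: int = 10,
) -> List[float]:
    """Average, per dimension, the vectors of the examples rated 4 or 5."""
    liked = [vec for vec, rating in rated_examples if rating >= 4]
    if not liked:
        return [0.0] * dims              # assumption: no liked examples -> zero profile
    n = len(liked)                       # n = number of examples rated 4 or 5
    return [sum(vec[j] for vec in liked) / n for j in range(dims)]
```

For instance, with two-dimensional vectors, the examples `([1, 0], 5)`, `([0, 1], 4)` and `([1, 1], 2)` give the profile `[0.5, 0.5]`, because the example rated 2 is ignored.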

In this method we chose a ten-dimensional feature vector to represent a place. Every dimension is a
type of information that we consider important for making suggestions. We then obtained type information
for every candidate place from the Google Place API. However, the type tags from the API differ somewhat
from the dimensions of our feature vector, so we convert these class tags into our feature vector by defining
weights that map each class tag to every dimension of the feature vector. We calculate each dimension of
a candidate place by the following equation:
I_j is the j-th dimension of the item's profile vector, and m is the number of subclasses with which the
item is labeled. Then, with the user's profile vector and the item's topic vector, we can compute a score of how
much a user likes a candidate item:
Score_{u,item} is the score of how much user u likes the item, P_i^u is the i-th dimension of the user's profile
vector P, and I_i^{item} is the i-th dimension of the item vector.

For each user, we compute the scores of all candidates and then recommend the top K items to the user.
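
A minimal sketch of this step is given below. It rests on two assumptions that are not taken from the formulas in this section: the item vector is the average of the per-tag weight rows over the tags of the item, and the score is the inner product of the user and item vectors. The weight table `TAG_WEIGHTS`, its tags, and the three-dimensional vectors are hypothetical and kept small for readability.

```python
import heapq
from typing import Dict, List, Sequence

# Hypothetical tag-to-dimension weights (3 dimensions instead of 10 for brevity).
TAG_WEIGHTS: Dict[str, List[float]] = {
    "park": [1.0, 0.0, 0.0],
    "museum": [0.0, 1.0, 0.0],
    "cafe": [0.0, 0.2, 0.8],
}


def item_vector(tags: Sequence[str], weights: Dict[str, List[float]], dims: int) -> List[float]:
    """Assumption: average the weight rows of the item's known tags."""
    known = [weights[t] for t in tags if t in weights]
    if not known:
        return [0.0] * dims
    m = len(known)                       # number of class tags found for the item
    return [sum(row[j] for row in known) / m for j in range(dims)]


def score(profile: Sequence[float], item: Sequence[float]) -> float:
    """Assumption: inner product of the user profile and the item vector."""
    return sum(p * i for p, i in zip(profile, item))


def top_k(profile: Sequence[float], candidates: Dict[str, Sequence[str]],
          weights: Dict[str, List[float]], k: int) -> List[str]:
    """Return the names of the k highest-scoring candidates, best first."""
    dims = len(profile)
    scored = [(score(profile, item_vector(tags, weights, dims)), name)
              for name, tags in candidates.items()]
    return [name for _, name in heapq.nlargest(k, scored)]
```

With the profile `[0.9, 0.1, 0.0]`, a candidate tagged only `park` scores higher than one tagged both `park` and `museum`, which in turn scores higher than one tagged only `museum`.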


4 Scheme II


Procedure (run) "lda" applies a topic model to users' interests and types of places. We use
descriptions of places as the input of LDA; that is, we regard each description of a place as a
document. We want to find the latent type distribution of places, so we use LDA to find a latent semantic
vector for every place to represent its text description. To some degree, this resembles reducing the
dimensionality of the original place descriptions.

After running the LDA model [3], we get a vector for every place. For each query, we know the user's
visited places and the candidate set. We denote the user's set of visited places as V and use rate(u, p) to
represent the rating given by user u to place p. We then calculate the score of the i-th candidate place for
user u with the following equation:


$$Score(u, p_i) = \sum_{p \in V} rate(u, p) \cdot \cos(\vec{p}, \vec{p}_i)$$
In the formula above, we use cosine similarity to measure the similarity of two vectors. This gives
the scores of the candidate places. We then sort the places by score in descending order and
recommend the top 50 places to the user when he or she is in the corresponding geographical position.
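
The ranking step can be sketched as follows, assuming the score of a candidate is the rating-weighted sum of cosine similarities between its topic vector and the topic vectors of the visited places. The weighted-sum form, the function names, and the input layout are assumptions of this sketch; the topic vectors themselves would come from the LDA model and are passed in as plain lists here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_candidate(candidate_vec: Sequence[float],
                    visited: Sequence[Tuple[Sequence[float], float]]) -> float:
    """Assumed form: sum over visited places of rating * cosine similarity."""
    return sum(rating * cosine(vec, candidate_vec) for vec, rating in visited)


def recommend(candidates: Dict[str, Sequence[float]],
              visited: Sequence[Tuple[Sequence[float], float]],
              top_n: int = 50) -> List[str]:
    """Sort candidates by score in descending order and keep the top_n names."""
    ranked = sorted(candidates,
                    key=lambda name: score_candidate(candidates[name], visited),
                    reverse=True)
    return ranked[:top_n]
```

Because the ratings act as weights, a candidate that resembles a highly rated visited place outranks one that resembles a poorly rated place, even when the two cosine similarities are equal.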


5 Summary

In our schemes, we first used the Google Place API to generate a candidate set for each specified
longitude and latitude. We then tried human annotation or a latent topic model to find the feature
vector of each place, which represents its type information. Our idea is based on the assumption that a user
would like to visit places of the same type as those he or she has visited before. For instance, if you have
visited parks several times, you are much more likely to visit parks near
you. So in both Scheme I and Scheme II, we use human annotations or a latent topic model
to find the type information of places and users' interests.

After analyzing the results of our model, we found some points that need to be
improved. First, we should get more candidate places for each geographical location; that is,
we should make our candidate set much bigger. Because of the access limit of the Google Place
API, we did not fetch enough candidate places near the specified position. Second, we should model
the types of places at a finer granularity. More parameter settings should be trained and tested for the
LDA model, so that we get a more accurate latent semantic type for each place and the representation of
a place becomes much more useful. The same improvement can also be made in Scheme I.

To summarize, we performed contextual suggestion by modeling the interests of users and the types of
places. The more accurate the type of every place is, the better the suggestions will be.